TamilTB: An Effort Towards Building a Dependency Treebank for Tamil
Annotated corpora such as treebanks are important for the development of parsers and language applications, as well as for understanding the language itself. Only very few languages possess such resources. In this paper, we describe our effort to syntactically annotate a small corpus (600 sentences) of Tamil. Our annotation is similar to that of the Prague Dependency Treebank (PDT 2.0) and consists of two layers: (i) a morphological layer (m-layer) and (ii) an analytical layer (a-layer). For both layers, we introduce annotation schemes: positional tagging for the m-layer, and dependency relations (including how dependency structures should be drawn) for the a-layer. Finally, we evaluate our corpus on tagging and parsing tasks using well-known taggers and parsers, and discuss some general issues in annotating Tamil.
Improvements to Korektor: A Case Study with Native and Non-Native Czech
We present recent developments of Korektor, a statistical spell-checking system. In addition to a lexicon, Korektor uses language models to find real-word errors, which are detectable only in context. The models and error probabilities, learned from error corpora, are also used to suggest the most likely corrections. Korektor was originally trained on a small error corpus and used language models extracted from WebColl, an in-house corpus. We show two recent improvements:
• We built new language models from freely available (shuffled) versions of the Czech National Corpus and show that they perform consistently better on texts produced both by native speakers and by non-native learners of Czech.
• We trained new error models on a manually annotated learner corpus and show that, in error detection, they outperform the standard error model not only on the learners' texts but also on our standard evaluation data of native Czech. For error correction, the standard error model outperformed the non-native models on 2 of the 3 test datasets.
We discuss reasons for this not-quite-intuitive improvement. Based on these findings and on an analysis of errors in both native and learners' Czech, we propose directions for further improvements of Korektor.
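The idea summarised above, a language model judging the context combined with error probabilities to rank corrections, can be illustrated with a generic noisy-channel sketch. This is not Korektor's implementation: the bigram counts, vocabulary size, add-one smoothing and error probabilities below are hypothetical toy values, and the example uses an English real-word error ("peace" typed for "piece") for readability.

```python
import math

# Hypothetical toy counts standing in for a bigram language model.
BIGRAM_COUNTS = {("a", "piece"): 100, ("a", "peace"): 2}
UNIGRAM_COUNTS = {"a": 200}
VOCAB_SIZE = 1000  # hypothetical vocabulary size used for add-one smoothing

# Hypothetical error model: P(typed | intended), keyed by (intended, typed).
ERROR_MODEL = {
    ("peace", "peace"): 0.90,  # the writer meant "peace" and typed it correctly
    ("piece", "peace"): 0.05,  # the writer meant "piece" but typed "peace"
}


def lm_logprob(prev: str, word: str) -> float:
    """Add-one smoothed bigram log-probability log P(word | prev)."""
    numerator = BIGRAM_COUNTS.get((prev, word), 0) + 1
    denominator = UNIGRAM_COUNTS.get(prev, 0) + VOCAB_SIZE
    return math.log(numerator / denominator)


def rank_corrections(prev: str, typed: str, error_model: dict) -> list:
    """Rank intended words w by log P(w | prev) + log P(typed | w), best first."""
    scored = [
        (lm_logprob(prev, intended) + math.log(p), intended)
        for (intended, observed), p in error_model.items()
        if observed == typed
    ]
    return [word for _, word in sorted(scored, reverse=True)]


print(rank_corrections("a", "peace", ERROR_MODEL))  # ['piece', 'peace']
```

With these toy numbers the context "a ..." outweighs the high prior of the typed word, so "piece" is ranked first; making the error model much less likely to produce "peace" from "piece" reverses the ranking, which is the trade-off between language and error models that such a system has to balance.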
Foliar application of Ascophyllum nodosum on improvement of photosynthesis, fruit setting percentage, yield and quality of tomato (Solanum lycopersicum L.)
In recent years, liquid formulations of the brown seaweed extract Ascophyllum nodosum have been used as biostimulants in agriculture. Various studies suggest that A. nodosum enhances the growth and yield of agriculturally important crops, but information on its biostimulatory effects on the photosynthesis, flowering and fruit setting of tomato is still lacking. Hence, the present study aimed to determine the effect of foliar application of A. nodosum on the photosynthesis, flowering, fruit setting, yield and quality of tomato. The biostimulant product MC Set, containing A. nodosum extract, was applied to tomato as a foliar spray at three concentrations, 1.0 L ha−1 (MS 1), 2.0 L ha−1 (MS 2) and 3.0 L ha−1 (MS 3), six times: at flowering of the 2nd (30 days after transplanting, DAT), 3rd (40 DAT) and 4th (50 DAT) clusters, and at fruit setting of the 2nd (60 DAT), 3rd (70 DAT) and 4th (80 DAT) clusters. The MC Set treatments enhanced plant photosynthesis, flower and fruit number per cluster, yield and the quality traits of tomato. Among them, the middle concentration, MS 2, showed the highest photosynthetic rate, stomatal conductance, SPAD value, and flower and fruit numbers in the 2nd, 3rd and 4th clusters. It also gave higher average fruit weight and yield per plant and per hectare, and improved quality parameters such as total soluble solids, ascorbic acid content, lycopene and total sugars compared with the control and the other two concentrations of MC Set. Hence, applying A. nodosum extract to tomato could be a sustainable approach to crop production.
Understanding the molecular basis of plant growth promotional effect of Pseudomonas fluorescens on rice through protein profiling
Background: The plant growth promoting rhizobacterium (PGPR) Pseudomonas fluorescens strain KH-1 was found to promote the growth of rice under both in vitro and in vivo conditions, but the mechanism underlying this promotional activity is not yet clearly understood. In this study, we sought to elucidate the molecular responses of rice plants to P. fluorescens treatment through protein profiling. Two-dimensional polyacrylamide gel electrophoresis was used to identify PGPR-responsive proteins, and the differentially expressed proteins were analyzed by mass spectrometry (MS).
Results: Upon priming with P. fluorescens, 23 proteins were differentially expressed in rice leaf sheaths. MS analysis revealed the differential expression of several important proteins, namely putative p23 co-chaperone, thioredoxin h (rice), ribulose-bisphosphate carboxylase large chain precursor, nucleotide diphosphate kinase, proteasome subunit protein and putative glutathione S-transferase.
Conclusion: The differentially expressed proteins have been reported to be directly or indirectly involved in plant growth promotion. Thus, this study confirms the primary role of PGPR strain KH-1 in promoting rice plant growth.
A high-throughput regeneration protocol for recalcitrant tropical Indian maize (Zea mays L) inbreds
Immature embryos from five selected recalcitrant maize (Zea mays L.) inbreds were used as explants and evaluated for their ability to form callus and somatic embryos and to regenerate into plants. The embryos were placed on N6 basal medium with varying levels of 2,4-D (0.5, 1.0, 1.5, 2.0 and 2.5 mg l-1) and regenerated on MS medium supplemented with BAP (2 to 10 mg l-1), 2,4-D (0.25 mg l-1) and silver nitrate (0.85 mg l-1). Explants cultured on N6 medium supplemented with 2,4-D (2.0 mg l-1) showed the highest frequency of embryogenic calli, and those of UMI 29 were highly embryogenic (78.67%). To investigate synergism between dicamba and 2,4-D in Type II callus production in UMI 29, 2,4-D (1 or 2 mg l-1) was tested alone and in combination with dicamba (3.7 mg l-1); the highest frequency of Type II callus (83.33%) was obtained on N6 medium containing 3.7 mg l-1 dicamba + 1 mg l-1 2,4-D. The highest percentage of shoot induction (82.67%) was observed on MS medium supplemented with BAP (10 mg l-1). Among the five genotypes tested, UMI 29 showed the highest percentages of callus initiation and shoot induction and the highest mean number of developed shoots. The protocol described in this study can reliably be used for the routine transformation of tropical maize inbreds.
Revisiting Low Resource Status of Indian Languages in Machine Translation
Indian language machine translation performance is hampered by the lack of large-scale multilingual sentence-aligned corpora and robust benchmarks. In this paper, we provide and analyse an automated framework for obtaining such a corpus for Indian language neural machine translation (NMT) systems. Our pipeline consists of a baseline NMT system, a retrieval module and an alignment module, and works with publicly available websites such as government press releases. Our main contribution is an incremental method that uses this pipeline to iteratively increase the size of the corpus and improve each component of the system. We also evaluate design choices such as the choice of pivot language and the effect of iteratively increasing the corpus size. In addition to providing an automated framework, our work yields a larger corpus than the existing corpora available for Indian languages. This corpus helps us obtain substantially improved results on the publicly available WAT evaluation benchmark and other standard evaluation benchmarks.
Comment: 10 pages, few figures, preprint under review
Internship report in community pharmacy
Internship report carried out as part of the Integrated Master's in Pharmaceutical Sciences, presented to the Faculty of Pharmacy of the University of Coimbra.
TamilTB - An Effort Towards Building a Treebank for Tamil
This talk presents our ongoing effort to build a PDT-style dependency treebank for Tamil. The talk will outline the annotation scheme and the annotation at the morphological and surface-syntax layers. We will discuss treebank annotation issues such as ambiguous structures, NP compounding, coordination phenomena and clitics. Since our ultimate goal in this project is to develop a feature-rich parsing framework for Tamil, we also present the results we obtained in automatic parsing (rule-based and corpus-based) using the developed resources. Some problematic issues in Tamil parsing will also be discussed.
Parsing under-resourced languages: Cross-lingual transfer strategies for Indian languages
Fast adaptation of language technologies to any language hinges on the availability of fundamental tools and resources such as monolingual and parallel corpora, annotated corpora, part-of-speech (POS) taggers and parsers. Languages that lack these fundamental resources are often referred to as under-resourced languages.
In this thesis, we address the problem of cross-lingual dependency parsing of under-resourced languages. We apply three methodologies to induce dependency structures:
(i) projecting dependencies from a resource-rich language to under-resourced languages via word-alignment links in a parallel corpus;
(ii) parsing under-resourced languages with delexicalized parsers, i.e. parsers whose models are trained on treebanks of other languages and use only POS categories rather than actual word forms; here we address incompatibilities in annotation style between source-side parsers and target-side evaluation treebanks by harmonizing the annotations to a common standard;
(iii) a new under-resourced scenario in which machine-translated rather than human-translated parallel corpora are used for projecting dependencies to under-resourced languages.
We apply these methodologies to five Indian languages (ILs): Hindi, Urdu, Telugu, Bengali and Tamil (in order of high to low availability of treebank data). To make evaluation possible for Tamil, we develop a dependency treebank for Tamil from scratch and use the created data both for evaluation and as a source for parsing other ILs. Finally, we outline strategies that can be used to obtain dependency structures for target languages under different resource-poor scenarios.
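Methodology (i) above can be illustrated with a minimal sketch of direct dependency projection over one-to-one word alignments. This is a simplified, generic version of the technique, not the implementation used in the thesis: the function name `project_heads`, the 1-based CoNLL-style head encoding (0 = root) and the toy sentences are assumptions, and a practical system would also need rules for unaligned and many-to-one aligned tokens, which this sketch simply leaves as None.

```python
from typing import Dict, List, Optional


def project_heads(
    src_heads: List[int], alignment: Dict[int, int], tgt_len: int
) -> List[Optional[int]]:
    """Project dependency heads from a source sentence onto a target sentence.

    src_heads[i - 1] is the head of source token i (tokens are 1-based and
    0 denotes the artificial root, as in CoNLL-style files). alignment maps a
    source token index to a single target token index (one-to-one links only).
    Returns the projected head of each target token, or None where no head
    could be projected.
    """
    tgt_heads: List[Optional[int]] = [None] * tgt_len
    for s_dep, s_head in enumerate(src_heads, start=1):
        t_dep = alignment.get(s_dep)
        if t_dep is None:
            continue  # unaligned source dependent: nothing to project
        if s_head == 0:
            tgt_heads[t_dep - 1] = 0  # the source root becomes the target root
        elif s_head in alignment:
            tgt_heads[t_dep - 1] = alignment[s_head]  # map the head through the alignment
    return tgt_heads


# Source (SVO): "John(1) saw(2) Mary(3)", with "saw" as the root.
# Hypothetical target with SOV order: subject(1) object(2) verb(3).
print(project_heads([2, 0, 2], {1: 1, 2: 3, 3: 2}, 3))  # [3, 3, 0]
```

In the example, both arguments end up attached to the sentence-final verb, which shows how projection carries the source tree across a word-order difference; when a source word has no alignment link, the corresponding target tokens stay unattached, which is one reason projected trees are typically noisy and incomplete.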
- …